ABSTRACT: Epidermal growth factor receptor (EGFR) protein tyrosine kinases (PTKs)
are known for their role in cancer. Quinazolines have been reported to be molecules
of interest with potent anticancer activity, and they act by binding to the ATP site of
protein kinases. The ATP binding site of protein kinases provides an extensive
opportunity to design newer analogs. With this background, we report an attempt to
discern the structural and physicochemical requirements for inhibition of EGFR
tyrosine kinase. The k-Nearest Neighbor Molecular Field Analysis (kNN-MFA), a
three-dimensional quantitative structure activity relationship (3D-QSAR) method, has
been used in the present case to study the correlation between the molecular
properties and the tyrosine kinase (EGFR) inhibitory activities of a series of
quinazoline derivatives. kNN-MFA calculations for both electrostatic and steric fields
were carried out. The master grid maps derived from the best model have been used to
display the contribution of electrostatic potential and steric field. The statistical
results showed a significant cross-validated correlation coefficient (q2) of 0.846, an
r2 for the external test set (pred_r2) of 0.8029, a coefficient of correlation of the
predicted data set (pred_r2se) of 0.6658, 89 degrees of freedom and 2 nearest
neighbors (k). Therefore, this study not only casts light on the binding mechanism
between EGFR and its inhibitors, but also provides hints for the design of new EGFR
inhibitors with observable structural diversity.
INTRODUCTION

Many of the tyrosine kinase enzymes are involved in cellular signaling pathways
and regulate key cell functions such as proliferation, differentiation, anti-apoptotic
signaling and neurite outgrowth. Unregulated activation of these enzymes, through
mechanisms such as point mutations or overexpression, can lead to a large percentage
of clinical cancers [1, 2]. The importance of tyrosine kinase enzymes in health and
disease is further underscored by the aberrations in tyrosine kinase signaling that
occur in inflammatory diseases and diabetes. Tyrosine kinases are important mediators
of the cellular signal transduction through which growth factors and oncogenes affect
cell proliferation, and their inhibitors represent a new kind of effective anticancer
drug [3, 4]. The development of tyrosine kinase inhibitors has therefore become an
active area of research in pharmaceutical science. Epidermal growth factor receptor
(EGFR), which plays a vital role as a regulator of cell growth, is one of the most
intensely studied tyrosine kinase targets. EGFR is overexpressed in numerous tumors,
including those derived from brain, lung, bladder, colon, breast, head and neck. EGFR
hyperactivation has also been implicated in other diseases including polycystic
kidney disease, psoriasis and asthma [5-7]. Since the hyperactivation of EGFR has
been associated with these diseases, inhibitors of EGFR have potential therapeutic
value and have been extensively studied in the pharmaceutical industry.

One cannot, however, be certain that the designed compounds will always possess good
inhibitory activity against EGFR, while experimental assessment of the inhibitory
activity of these compounds is time-consuming and expensive. Consequently, it is of
interest to develop a method for predicting biological activity before synthesis.
Quantitative structure activity relationship (QSAR) analysis searches for information
relating chemical structure to biological and other activities by developing a QSAR
model. Using such an approach, one can predict the activities of newly designed
compounds before deciding whether these compounds should actually be synthesized
and tested.

Many different approaches to QSAR have been developed over the years. The rapid
increase in three-dimensional (3D) structural information on bioorganic molecules,
coupled with the development of fast methods for 3D structure alignment (e.g. the
active analogue approach), has led to the development of 3D structural descriptors
and associated 3D QSAR methods. The most popular 3D QSAR methods are comparative
molecular field analysis (CoMFA) and comparative molecular similarity analysis
(CoMSIA) [8, 9]. The CoMFA method involves generation of a common three-dimensional
lattice around a set of molecules and calculation of the steric and electrostatic
interaction energies at the lattice points. The interaction energies are numerically
very high when a lattice point is very close to an atom, and special care needs to be
taken to avoid problems arising from this. The CoMSIA method avoids these problems by
using similarity functions represented as Gaussians; molecular similarity is evaluated
and used instead of the molecular field. In both methods, the information around the
molecule is converted into numerical data and analysed with the partial least squares
(PLS) method, which reduces the dimensionality of the data by generating components.
However, a major disadvantage is that PLS attempts to fit a linear relationship among
all the points in the data set. Further, the PLS method does not offer scope for
improvement in results. Several reports have observed that the predictive ability of
the PLS method is rather poor because of this linear fit.

Variable selection methods have also been adopted for optimal region selection in
3D QSAR methods and have been shown to provide improved QSAR models compared with
the original CoMFA technique. For example, GOLPE was developed using chemometric
principles, and q2-GRS was developed on the basis of independent analyses of small
areas (or regions) of near molecular space to address the issue of optimal region
selection in CoMFA [10, 11]. These considerations provide an impetus for the
development of fast, generally nonlinear, variable selection methods for performing
molecular field analysis. With the above facts, and in continuation of our research
on newer anticancer agents [12, 13], we report in the present study the development
of a new method (kNN-MFA) that adopts the k-nearest neighbor principle for generating
relationships between molecular fields and the experimentally reported activity, in
order to provide further insight into the key structural features required to design
potential drug candidates of this class. This method utilises the active analogue
principle that lies at the foundation of medicinal chemistry.

COMPUTATIONAL METHODS 

A. Methodology 

We hereby report the models generated by kNN-MFA in conjunction with the stepwise
(SW) forward-backward variable selection method. In the kNN-MFA method, several
models were generated for the selected members of the training and test sets, and
the corresponding best models are reported herein. VLife Molecular Design Suite
(VLifeMDS) allows the user to choose the probe, grid size and grid interval for the
generation of descriptors. The variable selection method and its parameters can
also be chosen, and optimum models are generated by maximizing q2. k-Nearest
neighbor molecular field analysis (kNN-MFA) requires suitable alignment of the given
set of molecules. This is followed by generation of a common rectangular grid around
the molecules. The steric and electrostatic interaction energies are computed at the
lattice points of the grid using a methyl probe of charge +1. These interaction
energy values are considered for relationship generation and utilised as descriptors
to decide nearness between molecules. The term descriptor is used in the following
discussion to indicate field values at the lattice points. The optimal training and
test sets were generated using the sphere exclusion algorithm [14]. This algorithm
allows the construction of training sets covering the descriptor space occupied by
representative points. Once the training and test sets were generated, the kNN
methodology was applied to the descriptors generated over the grid.

1. Nearest Neighbor (kNN) Method 

The kNN methodology relies on a simple distance 
learning approach whereby an unknown member is 
classified according to the majority of its k-nearest 
neighbors in the training set. The nearness is 
measured by an appropriate distance metric (e.g., 
a molecular similarity measure calculated using 
field interactions of molecular structures). The 
standard kNN method is implemented simply as 
follows: Calculate distances between an unknown 
object (u) and all the objects in the training set; 
select k objects from the training set most similar 
to object u, according to the calculated distances; 
and classify object u with the group to which the 
majority of the k objects belongs. An optimal
k value is selected by optimisation through the
classification of a test set of samples or by
leave-one-out cross-validation [15].
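
The three steps of the standard kNN method described above can be sketched as follows. This is a minimal illustration only, assuming plain Euclidean distance on small descriptor vectors; the function name `knn_classify` and the toy data are hypothetical and are not taken from VLifeMDS.

```python
import math
from collections import Counter

def knn_classify(unknown, training, k):
    """Classify `unknown` by majority vote among its k nearest training objects.

    `training` is a list of (descriptor_vector, label) pairs. Euclidean
    distance is an assumption made for this sketch.
    """
    # Step 1: distances between the unknown object and every training object.
    dists = sorted((math.dist(unknown, x), label) for x, label in training)
    # Step 2: keep the k most similar (closest) objects.
    nearest = dists[:k]
    # Step 3: assign the label held by the majority of those k objects.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor toy set.
toy_training = [
    ((0.0, 0.0), "inactive"),
    ((0.0, 1.0), "inactive"),
    ((5.0, 5.0), "active"),
    ((6.0, 5.0), "active"),
    ((5.0, 6.0), "active"),
]
print(knn_classify((5.2, 5.1), toy_training, k=3))
```

In practice k would be tuned, for example by the leave-one-out procedure mentioned in the text, rather than fixed in advance.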

2. kNN-MFA with Simulated Annealing 

Simulated annealing (SA) is the simulation of 
a physical process, annealing, which involves 
heating the system to a high temperature and then 
gradually cooling it down to a preset temperature 
(e.g., room temperature). During this process, the 
system samples possible configurations distributed 
according to the Boltzmann distribution so that
at equilibrium, low energy states are the most 
populated. 

3. kNN-MFA with Stepwise (SW) Variable 
Selection 

This method employs a stepwise variable selection 
procedure combined with kNN to optimise the 
number of nearest neighbors (k) and the selection 
of variables from the original pool as described in 
simulated annealing. 

4. kNN-MFA with Genetic Algorithm

Genetic algorithms (GA), first described by Holland
[16], mimic natural evolution and selection. In
biological systems, genetic information that 
determines the individuality of an organism is stored 
in chromosomes. Chromosomes are replicated and 
passed onto the next generation with selection 
criteria depending on fitness. 

B. Chemical Data 

One hundred twenty-six quinazoline derivatives reported as tyrosine kinase (EGFR)
inhibitors were taken from the literature and used for kNN-MFA analysis [17-28].
These quinazoline derivatives showed wide variation in their structure and potency
profiles. kNN-MFA (3D-QSAR) models were generated for these derivatives using a
training set of 98 molecules. The predictive power of the resulting models was
evaluated with a test set of 28 molecules with uniformly distributed biological
activities. Test set molecules were selected so that they represent structural
features similar to those of compounds in the training set [29]. The structures of
all compounds, along with their actual and predicted biological activities, are
shown in Table 1 (A-W).

Table 1 (A-W): Structure, Experimental and Predicted Activity of Quinazolines Used in
Training and Test Set using model 1

Table 1A:

Sr.No.  Index  R          IC50 (µM)  Exp.    Pred.   Residual
1       4                 0.921      6.0357  5.7791   0.2566
2       5      (CH^l^OH   1.643      5.7843  6.2187  -0.4344
3       6                 0.402      6.3957  5.8729   0.5228
4       7      CiCHgl^OH  1.362      5.8658  6.009   -0.1432

Table 1B:

9  8T  =CH  H  0.126  6.8996  6.3894  0.5102

R3    IC50 (µM)  Exp.    Pred.   Residual
OCH3  1.00       6.00    6.0671  -0.061
H     0.320      6.4948  7.0614  -0.5666
OCH3  0.058      7.2365  7.0613   0.1752
OCH3  0.010      8.000   8.4797  -0.4797
H     0.086      7.0655  7.3682  -0.3027
OCH3  0.029      7.5376  7.0593   0.4783





www.jbclinpharm.com 



157 



Vol-001 lssue-003 June 2010-August2010 



Malleshappa N. Noolvi and Harun M. Patel 



Journal of Basic and Clinical Pharmacy 



Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
22      22     0.18       6.7447  6.8724  -0.1277
23      23T    0.50       6.3010  7.5469  -1.2459
24      24     0.77       6.1135  6.0487   0.0648
25      25     0.63       6.2006  6.4705  -0.2699

Table 1F: Continued

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
29      121    0.873      6.0589  5.4980   0.5889
30      122    9.601      5.0176  4.8863   0.1313
31      123    0.671      6.1732  6.6764  -0.504
32      124    0.066      7.1804  6.9526   0.2278


Table 1G:

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
33      33     0.014      7.8538  7.5036   0.8178
34      36^    0.013      7.8860  7.783    0.103
35      37     0.033      7.4814  7.3368   0.1446
36      26     1.200      5.9208  6.1254  -0.2046
37      27     0.200      6.6989  7.5001  -0.8012




3D QSAR Studies on Quinazoline Derivatives: The kNN Approach 




Table 1K: Continued

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
56      101    0.068      7.1674  6.9495   0.2179
57      99     0.104      6.9829  7.1621  -0.1792
58      100    0.830      6.0809  7.0556  -0.9747
59      102    0.074      7.1307  6.9241   0.2066
60      103    0.074      7.1307  7.1457  -0.015

Table 1L:

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
61      95     0.011      7.9586  7.3971   0.5615
62      93     0.020      7.6989  8.0204  -0.3215
63      94     0.093      7.0315  7.1333  -0.1018
64      92     0.027      7.5686  7.1199   0.4487
65      96     0.024      7.6197  7.9548  -0.3351

Table 1O: Continued

Sr.No.  Index  R2    R3    R4  IC50 (µM)  Exp.   Pred.   Residual
82      112    OCH3  OCH3      2.13                      -0.5325
83      114    OCH3  H     H   0.10       7.000  7.1685  -0.1685



Table 1P:

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
84      115    0.30       6.3010  6.9534  -0.6524
85      116    0.01       8.000   8.2981  -0.2981


Table 1Q:

Sr.No. 86, 87; Index 118T

IC50 (µM)  Exp.   Pred.  Residual
0.2        6.698  7.061  -0.363
0.01       8.00   6.385   1.615
0.026      7.585  7.061   0.524

Table 1S: Continued

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
                          6.0506  6.0381   0.0125
                          6.2009  5.9567   0.2442
                          6.3009  6.5277  -0.2268
                          6.4049  6.3490   0.0559
                          6.1804  7.0103  -0.8299
                          8.000   8.3896  -0.3896
                          6.0861  6.3725  -0.2864
                          6.2054  6.1541  -0.3356
                          6.1366  6.0857   0.0509
                          6.3040  6.7578  -0.4538
                          6.7695  6.8587  -0.0937
108     80^    0-72       6.5421  6.3405   0.2016

Table 1S: Continued

Sr.No.  Index  R  IC50 (µM)  Exp.    Pred.   Residual
                             6.4422  6.5112  -0.069
                             6.0409  6.3538  -0.3129
                             8.1938  6.6823   1.5115



Table 1T:

Sr.No.  Index  R1      R2    IC50 (µM)  Exp.    Pred.   Residual
113     69T    OCH3    OCH3  0.01       8.000   7.9253   0.0747
114     78     OCOCH3  OCH3  0.084      7.0757  6.8913   0.1844
115     85^    OH      OCH3  0.39       6.4089  6.3825   0.0264
116     86             OCH3  0.3        6.5228  6.4423   0.0805
117     87     CH3     OCH3  0.027      7.5686  7.9773  -0.4087

Table 1U: Continued

Sr.No.  Index  IC50 (µM)  Exp.    Pred.   Residual
119     68     0.02       7.6989  7.5209  0.178



Table 1V:

Sr.No.  Index  R2       R3       IC50 (µM)  Exp.    Pred.   Residual
120     35T    OC2OCH3  OC2OCH3  0.0018     8.7447  6.6367   2.108
121     34     OC2OCH3  OC2OCH3  0.012      7.9208  7.930   -0.0092
122     29     OCH3     OCH3     0.026      7.5850  7.4378   0.1472



Table 1W:

Sr.No.  Index  R  IC50 (µM)  Exp.    Pred.   Residual
123     63        0.008      8.0969  7.0047   1.0992
124     64        0.023      6.6382  7.4201  -0.7819
125     65        0.038      7.4202  6.6383   0.7819
126     66     H  0.010      8.000   8.2915  -0.2915

Exp. = Experimental activity, Pred. = Predicted activity

a = Compound concentration required to inhibit tumor cell proliferation by 50%

b = -log (IC50 x 10^-6); training data set developed using model 1

T = Test set

C. Biological Activities

The 126 quinazoline derivatives with different substitutions were divided into two
sets: 98 molecules (78%) were taken for the training set and 28 compounds (22%) for
the test set. IC50 (µM) values for EGFR inhibition were transformed into
-log (IC50 x 10^-6), i.e. pIC50 [30]. Compounds that exhibited insignificant or no
inhibition were excluded from the present study. All the IC50 values had been
obtained using the in vitro MTT assay method [31, 32]. The IC50 values of reference
compounds were checked to ensure that no difference occurred between different
groups. The pIC50 values of the molecules under study spanned a wide range, from
about 5 to 9.
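
The transformation from IC50 to pIC50 can be checked against the tabulated values with a few lines of code. This is a small sketch assuming IC50 is given in micromolar, as the factor of 10^-6 in the formula implies; the function name is hypothetical.

```python
import math

def pic50_from_micromolar(ic50_um):
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 x 10^-6)."""
    return -math.log10(ic50_um * 1e-6)

# First entry of Table 1A: IC50 = 0.921 µM, tabulated experimental pIC50 = 6.0357.
print(round(pic50_from_micromolar(0.921), 4))
```

An IC50 of 1 µM therefore corresponds to a pIC50 of 6, and every tenfold gain in potency adds one pIC50 unit.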

D. Data Set

All computational work was performed on an Apple workstation (8-core processor)
using VLife MDS QSAR plus software developed by VLife Sciences Technologies Pvt.
Ltd., Pune, India, on the Windows XP operating system. All the compounds were drawn
in Chem DBS using the fragment database and then subjected to energy minimisation
using the batch energy minimisation method.

E. Molecular Modeling and Alignment

Conformational searches were carried out by the systematic conformational search
method and the lowest energy conformers were selected. All the compounds were
aligned by the template-based method. The template molecule for alignment was
selected by considering the following candidates: a) the most active compound; b)
the lead or commercial compound; c) the compound containing the greatest number of
functional groups [33, 34]. Generally, the low energy conformer of the most active
compound is selected as a reference [35]. In the present study, all the compounds
were aligned against the minimum energy conformation of the most active compound,
no. 28 (Figure 1), using the quinazoline nucleus as the template, as shown in
Figure 2.

F. Selection of Training and Test Set

The dataset of 126 molecules was divided into training and test sets by the Sphere
Exclusion (SE) method for model 1, model 2 and model 3, with dissimilarity values of
8.2, 8.3 and 8.1, respectively, with the pIC50 activity field as the dependent
variable and the various 3D descriptors calculated for the compounds as independent
variables.
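
To make the role of the dissimilarity value concrete, the sketch below shows one simplified variant of sphere exclusion. It is an assumption-laden illustration, not the VLifeMDS implementation: the seed order (first remaining compound), the Euclidean metric, and the rule that every non-centre compound inside a sphere goes to the test set are all choices made here for brevity, and `sphere_exclusion` is a hypothetical name.

```python
import math

def sphere_exclusion(descriptors, radius):
    """Split compound indices into (training, test) lists.

    Simplified variant: each sphere centre joins the training set, and all
    remaining compounds within `radius` of that centre join the test set.
    """
    remaining = list(range(len(descriptors)))
    training, test = [], []
    while remaining:
        centre = remaining.pop(0)          # assumed seed rule: next unassigned compound
        training.append(centre)
        inside = [i for i in remaining
                  if math.dist(descriptors[centre], descriptors[i]) <= radius]
        test.extend(inside)                # compounds covered by this sphere
        remaining = [i for i in remaining if i not in inside]
    return training, test

# Hypothetical 2D descriptor points forming two tight pairs and one isolated compound.
toy_points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.4, 10.0), (20.0, 0.0)]
print(sphere_exclusion(toy_points, radius=1.0))
```

Under this variant, a larger radius (dissimilarity value) places more compounds inside each sphere and so moves more of them to the test set, which is why the split changes between the values 8.1, 8.2 and 8.3.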

Figure 1: Reference molecule (28) used for
alignment by template-based alignment




G. Cross-Validation Using Weighted k-Nearest Neighbor

This is done to test the internal stability and predic- 
tive ability of the QSAR models. Developed QSAR 
models were validated by the following procedure: 

a) Internal Validation. 

(1) A molecule in the training set was eliminated, 
and its biological activity was predicted as the 
weighted average activity of the k most similar 
molecules (eq. 1). The similarities were evaluated 
as the inverse of Euclidean distances between mol- 
ecules (eq. 2) using only the subset of descriptors 
corresponding to the current trial solution 

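\[ \hat{y}_i = \sum_{j=1}^{k} w_j \, y_j \qquad (1) \]

\[ w_j = \frac{\exp(-d_{ij})}{\sum_{l=1}^{k} \exp(-d_{il})}, \qquad
   d_{ij} = \sqrt{\sum_{m} (x_{im} - x_{jm})^2} \qquad (2) \]

Here the sums over j and l run over the k nearest neighbors of molecule i, $y_j$ is
the activity of neighbor j, and m runs over the selected descriptors.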

(2) Step 1 was repeated until every molecule in the 
training set has been eliminated and its activity pre- 
dicted once. 

(3) The cross-validated (q2) value was calculated using eq. 3, where $y_i$ and
$\hat{y}_i$ are the actual and predicted activities of the ith molecule,
respectively, and $y_{mean}$ is the average activity of all molecules in the
training set. Both summations are over all molecules in the training set. Since the
calculation of the pairwise molecular similarities, and hence the predictions, were
based upon the current trial solution, the q2 obtained is indicative of the
predictive power of the current kNN-MFA model.

\[ q^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - y_{mean})^2} \qquad (3) \]
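
The leave-one-out procedure and eq. 3 can be sketched together as below. This is a minimal illustration under stated assumptions: neighbours are weighted by exp(-d) with Euclidean d, as in the weighting scheme above, and the function names and toy data are hypothetical rather than part of VLifeMDS.

```python
import math

def predict_weighted_knn(query, train_x, train_y, k):
    """Predict activity as the exp(-d)-weighted mean of the k nearest neighbours."""
    nearest = sorted((math.dist(query, x), y) for x, y in zip(train_x, train_y))[:k]
    d_min = nearest[0][0]
    # Subtracting d_min leaves the normalised weights unchanged but avoids
    # exp() underflowing to zero for every neighbour when distances are large.
    weights = [math.exp(-(d - d_min)) for d, _ in nearest]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total

def loo_q2(xs, ys, k):
    """Leave-one-out cross-validated q2: 1 - PRESS / SS about the training mean."""
    preds = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]       # eliminate molecule i
        rest_y = ys[:i] + ys[i + 1:]
        preds.append(predict_weighted_knn(xs[i], rest_x, rest_y, k))
    y_mean = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss

# Hypothetical one-descriptor toy data.
toy_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_y = [1.0, 2.0, 3.0, 4.0]
print(loo_q2(toy_x, toy_y, k=2))
```

During variable selection, a function such as `loo_q2` would be re-evaluated for each trial subset of descriptors, and the subset with the highest q2 retained.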



b) External Validation. 

The predicted (pred_r2) value was calculated using eq. 4, where $y_i$ and
$\hat{y}_i$ are the actual and predicted activities of the ith molecule in the test
set, respectively, and $y_{mean}$ is the average activity of all molecules in the
training set. Both summations are over all molecules in the test set. Thus, the
pred_r2 value is indicative of the predictive power of the current kNN-MFA model
for the external test set.

\[ pred\_r^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - y_{mean})^2} \qquad (4) \]

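
Eq. 4 differs from eq. 3 only in which molecules are summed over and in using the training-set mean as the reference. A short sketch, with a hypothetical function name and with example numbers borrowed from the tables purely for illustration:

```python
def pred_r2(y_test, y_pred, y_train_mean):
    """External predictive r2: sums run over the test set, mean comes from training."""
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss

# Three test-set pIC50 values with their model 1 predictions, and the
# training-set average reported in Table 3.
observed = [6.8996, 6.3010, 8.000]
predicted = [6.3894, 7.5469, 7.9253]
print(pred_r2(observed, predicted, y_train_mean=6.4542))
```

A model that simply predicted the training mean for every test compound would score exactly zero, so pred_r2 measures improvement over that baseline and can be negative for a poor model.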

c) Randomization Test. 

To evaluate the statistical significance of the QSAR model for an actual data set,
we employed a one-tailed hypothesis test [36-37]. The robustness of the QSAR models
for experimental training sets was examined by comparing these models to those
derived for random data sets. Random sets were generated by rearranging the
biological activities of the training set molecules. The significance of the models
hence obtained was derived from the calculated Z score [36-37].

\[ Z\,score = \frac{h - \mu}{\sigma} \qquad (5) \]

where h is the q2 value calculated for the actual dataset, $\mu$ is the average q2
and $\sigma$ is its standard deviation, calculated over various iterations using
models built from different random datasets. The probability ($\alpha$) of
significance of the randomization test is derived by comparing the Z score value
with the Z score critical value if the Z score value is less than 4.0; otherwise it
is calculated by the formula given in the literature. For example, a Z score value
greater than 3.10 indicates that there is a probability ($\alpha$) of less than
0.001 that the QSAR model constructed for the real dataset is random. The
randomisation test suggests that all the developed models have a probability of
less than 1% of having been generated by chance.

Figure 3: Hierarchical Graph Showing Uniform Distribution of Training and Test Set
(axis label: Training and Test Compounds)

Figure 4: Graph of Actual vs. Predicted activities for training and test set
molecules from the kNN-MFA model 1, A) Training set (Blue dots) B) Test Set
(Yellow dots)

Figure 5: 3D-alignment of molecules with the important steric and electrostatic
points contributing to the model, with range of values shown in parentheses
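
The Z score of eq. 5 is straightforward to compute once q2 values for the scrambled-activity models are available. The sketch below assumes the sample standard deviation is used (the text does not say whether a sample or population estimate is meant), and the function name and numbers are hypothetical.

```python
import statistics

def randomization_z_score(q2_actual, q2_random):
    """Z score = (h - mu) / sigma, with mu and sigma taken over random-model q2 values."""
    mu = statistics.mean(q2_random)
    sigma = statistics.stdev(q2_random)    # assumption: sample standard deviation
    return (q2_actual - mu) / sigma

# Hypothetical q2 values from models built on shuffled activities.
shuffled_q2 = [0.0, 0.1, 0.2]
print(randomization_z_score(0.8, shuffled_q2))
```

Each entry of `shuffled_q2` would come from refitting the model after permuting the training-set activities; a Z score above the critical value of 3.10 quoted in the text then corresponds to a probability below 0.001 that the real model arose by chance.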

EXPERIMENTAL 

All one hundred twenty-six compounds were built on a workstation using the molecular
modeling software VLifeMDS, a product of VLife Sciences Technologies Pvt. Ltd.,
India [38]. We hereby report the models generated by kNN-MFA in conjunction with the
stepwise (SW) forward-backward variable selection method, shown in Table 2.

Table 2: Statistical Results of kNN-MFA method

Parameters   Model 1                      Model 2                      Model 3
             (Dissimilarity value = 8.2)  (Dissimilarity value = 8.3)  (Dissimilarity value = 8.1)
n            98                           98                           98
k            2                            2                            2
q2           0.8463                       0.7487                       0.7802
pred_r2      0.8029                       0.7192                       0.7412
pred_r2se    0.6658                       0.7126                       0.6533
Z score      15.25775                     11.5635                      14.23195
best_ran_q2  0.03060                      0.1225                       0.1092
α_ran_q2     0.0001                       0.001                        0.001
Descriptors  E_1882 (6.6335, 8.2052)      E_1882 (6.6335, 8.2052)      S_1099 (11.5811, 12.2127)
             S_1462 (30.0000, 30.0000)    E_1515 (-0.1446, 0.2255)     E_2289 (-1.6082, -0.7847)
             E_2289 (-1.3653, -1.1622)    S_2892 (-0.0329, -0.0067)    S_1631 (-0.3820, -0.2460)
             E_2287 (0.1633, 0.2380)      S_1462 (30.0000, 30.0000)    E_1911 (-3.9843, -3.3125)
             E_2615 (-0.0583, 0.0815)     S_734 (-0.0194, -0.0098)     E_2272 (-0.3129, -0.1297)
             E_2874 (0.0617, 0.0680)                                   S_512 (-0.0214, -0.0165)
Vn           06                           05                           06

In the present kNN-MFA study, a (-13.2343 to 19.1320) x (-12.0268 to 15.0494) x
(-11.2513 to 15.4959) Å grid at an interval of 2.00 was generated around the aligned
compounds. The steric and electrostatic interaction energies were computed at the
lattice points of the grid using a methyl probe of charge +1 with Gasteiger-Marsili
type charges. These interaction energy values were considered for relationship
generation and utilized as descriptors to decide nearness between molecules. The
QSAR models were developed using the forward-backward variable selection method with
the pIC50 activity field as the dependent variable and physicochemical descriptors
as independent variables, with cross-correlation limits of 1, 0.8 and 0.9 for model
1, model 2 and model 3, respectively. Test and training sets were selected by the
sphere exclusion method with dissimilarity values of 8.2, 8.3 and 8.1 for model 1,
model 2 and model 3, respectively. The variance cut-off point was 0.0. The maximum
and minimum numbers of neighbors were 5 and 2, respectively.

The method described above has been implemented in the software VLife Molecular
Design Suite (VLifeMDS) [38], which allows the user to choose the probe, grid size
and grid interval for the generation of descriptors. The variable selection method
and its parameters can also be chosen, and optimum models are generated by
maximizing q2.

Steps Involved in kNN-MFA Method

1. Molecules are optimized before alignment. Optimization is done by MOPAC energy
minimization and is necessary for proper alignment of the molecules around the
template.
2. The kNN-MFA method requires suitable alignment of the given set of molecules;
alignment is template based.
3. This is followed by generation of a common rectangular grid around the
molecules. The steric and electrostatic interaction energies are computed at the
lattice points of the grid using a methyl probe of charge +1.
4. The optimal training and test sets are generated using the sphere exclusion
method.
5. Models are generated by the various kNN methods and validated internally and
externally by leave-one-out and external validation.
6. The activity of the test set of compounds is predicted.

Since the final equations are not very useful for representing the kNN-MFA models
efficiently, 3D master grid maps of the best models are displayed. They represent
the areas in space where steric and electrostatic field interactions are responsible
for the observed variation of the biological activity.
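
As a rough illustration of how the grid extents and the 2.00 interval translate into lattice points (and hence candidate field descriptors), the sketch below enumerates points along each axis. It assumes points start at the lower bound and are placed every 2.00 units without exceeding the upper bound; how VLifeMDS actually places and indexes lattice points is not stated in the text, and `lattice_axis` is a hypothetical helper.

```python
import math

def lattice_axis(lower, upper, step):
    """Coordinates lower, lower + step, ... that do not exceed upper (assumed rule)."""
    count = math.floor((upper - lower) / step) + 1
    return [lower + i * step for i in range(count)]

# Grid extents reported for the aligned compounds, interval 2.00.
xs = lattice_axis(-13.2343, 19.1320, 2.0)
ys = lattice_axis(-12.0268, 15.0494, 2.0)
zs = lattice_axis(-11.2513, 15.4959, 2.0)

n_points = len(xs) * len(ys) * len(zs)
print(len(xs), len(ys), len(zs), n_points)
```

Under these assumptions each lattice point yields one steric and one electrostatic value per molecule, which is the descriptor pool from which the variable selection step picks the handful of points listed in Table 2.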

RESULTS AND DISCUSSION

A training set of 98 and a test set of 28 quinazoline derivatives with different
substitutions were employed. The following statistical measures were used to
correlate biological activity and molecular descriptors: n = number of molecules,
Vn = number of descriptors, k = number of nearest neighbors, df = degrees of
freedom, r2 = coefficient of determination, q2 = cross-validated r2 (by the
leave-one-out method), pred_r2 = r2 for the external test set, pred_r2se =
coefficient of correlation of the predicted data set, Z score = the Z score
calculated from q2 in the randomisation test, best_ran_q2 = the highest q2 value in
the randomisation test and α_ran_q2 = the statistical significance parameter
obtained by the randomisation test.

Training and test sets were selected by the sphere exclusion method. The unicolumn
statistics (Table 3) show that the maximum of the test set is less than the maximum
of the training set and the minimum of the test set is greater than that of the
training set, which is a prerequisite for further QSAR analysis. This result shows
that the test set is interpolative, i.e. derived within the min-max range of the
training set. The mean and standard deviation of the training and test sets provide
insight into the relative difference of the mean and the point density distribution
of the two sets. In this case the mean of the test set is higher than that of the
training set, which shows the presence of relatively more active molecules as
compared to the inactive ones. Also, the similar standard deviations of the two
sets indicate that the spread of each set about its mean is comparable.

Table 3: Unicolumn Statistics of Training and Test Set

Unicolumn statistics  Average  Max     Min     Std. Deviation
For Training Set      6.4542   8.7447  4.6985  1.1079
For Test Set          6.8568   8.1938  5.3872  0.7417
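
The interpolation check described above is easy to automate. The following sketch computes the same four summary statistics and tests whether the test-set range lies inside the training-set range; the function names are hypothetical, the sample standard deviation is an assumption, and the short lists are illustrative values built around the extremes reported in Table 3.

```python
import statistics

def unicolumn(values):
    """Average, max, min and (sample) standard deviation of an activity column."""
    return {
        "average": statistics.mean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.stdev(values),
    }

def is_interpolative(train, test):
    """True if every test activity lies within the min-max range of the training set."""
    return min(train) <= min(test) and max(test) <= max(train)

# Illustrative pIC50 lists using the Table 3 extremes plus made-up interior values.
train_pic50 = [4.6985, 6.0, 6.5, 8.7447]
test_pic50 = [5.3872, 7.0, 8.1938]
print(unicolumn(test_pic50)["max"], is_interpolative(train_pic50, test_pic50))
```

If `is_interpolative` returned False, the test set would contain activities outside the range the model was trained on, and pred_r2 would then partly measure extrapolation.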

The activity distribution graph shows the comparison between the activities of the
training and test sets. It can be observed from the hierarchical graph that the test
set molecule activities lie within the range of the training set (Figure 3).

The observed and predicted pIC50 values, along with the residual values for model 1,
are shown in Table 1 (A-W). The plot of observed vs. predicted activity is shown in
Figure 4. From the plot it can be seen that the kNN-MFA model is able to predict the
activity of the training set quite well (all points are close to the regression
line), as well as that of the external test set.

During the kNN-MFA investigation, dissimilarity values in the range 8.0000 to 9.5000
were investigated for the selection of training and test sets by the sphere
exclusion method. The dissimilarity value of 8.200 produced a more significant
result than 8.100 and 8.300, as shown in Table 2. A further increase in resolution
produced a decrease in model quality. From Table 2 it was observed that the results
were less sensitive to the resolution of the dissimilarity value.

It is known that the CoMFA method provides significant value for new molecule
design when contours of the PLS coefficients are visualised for the set of
molecules. Similarly, the kNN-MFA models provide direction for the design of new
molecules in a rather convenient way. The points that contribute to kNN-MFA model 1
are displayed in Figure 5. The range of property values for the chosen points may
aid in the design of new potent molecules (Figure 5). The range is based on the
variation of the field values at the chosen points using the most active molecule
and its nearest neighbor set.

The q2, pred_r2, Vn and k values of the kNN-MFA models 1, 2 and 3 were (0.8463,
0.8029, 06/2), (0.7487, 0.7192, 05/2) and (0.7802, 0.7412, 06/2), respectively.
Among these three models, model 1 has a better q2 (0.8463) and pred_r2 (0.8029)
than the other two, corresponding to an internal predictive ability of 84.63% for
the training set and an external predictive ability of 80.29% for the test set. It
uses 1 steric and 5 electrostatic descriptors with 2 nearest neighbors (k = 2) to
evaluate the activity of a new molecule.

The model is validated by α_ran_q2 = 0.0001, best_ran_q2 = 0.03060 and Z
score_ran_q2 = 15.25775. The randomisation test suggests that the developed model
has a probability of less than 1% of having been generated by chance.

The kNN-MFA models obtained using all three dissimilarity values showed that
electrostatic and steric interactions play a major role in determining biological
activity. S_1462 in model 1, S_1462, S_2892 and S_734 in model 2, and S_1099,
S_1631 and S_512 in model 3 are steric field descriptors. Similarly, E_1882, E_2289,
E_2287, E_2615 and E_2874 in model 1, E_1515 and E_1882 in model 2, and E_2289,
E_1911 and E_2272 in model 3 are electrostatic field descriptors. It can also be
noted that the electrostatic descriptor E_1882 and the steric descriptor S_1462 were
common to model 1 and model 2 using the forward-backward variable selection method,
implying a significant role for these descriptors in the electrostatic and steric
field interactions underlying the structure activity relationship.

A negative value in the electrostatic field descriptors indicates that negative
electrostatic potential is required to increase activity, so a more electronegative
substituent group is preferred at that position. A positive range indicates that a
group imparting positive electrostatic potential is favorable for activity, so a
less electronegative group is preferred in that region. Similarly, a negative range
in the steric descriptors indicates that negative steric potential is favorable for
activity, so a less bulky substituent group is preferred in that region. A positive
value of the steric descriptors reveals that positive steric potential is favorable
for an increase in activity, so a more bulky group is preferred in that region.

n, number of observations (molecules); Vn, number of descriptors; k, number of
nearest neighbors; q2, cross-validated r2 (by the leave-one-out method); pred_r2,
predicted r2 for the external test set; Z score, the Z score calculated from q2 in
the randomization test; best_ran_q2, the highest q2 value in the randomisation
test; and α_ran_q2, the statistical significance parameter obtained by the
randomization test.

CONCLUSION 

In conclusion, the model developed to predict the structural features of
quinazolines required to inhibit EGFR tyrosine kinase reveals useful information
about the structural requirements for the molecule. Of the three optimized models,
model 1 gives the most significant results. The master grid obtained for the various
kNN-MFA models shows that a negative value in the electrostatic field descriptors
indicates that negative electrostatic potential is required to increase activity, so
a more electronegative substituent group is preferred at that position; a positive
range indicates that a group which imparts positive electrostatic potential is
favorable for activity, so a less electronegative group is preferred in that region.
A negative range in the steric descriptors indicates that negative steric potential
is favorable for activity, so a less bulky substituent group is preferred in that
region. A positive value of the steric descriptors reveals that positive steric
potential is favorable for an increase in activity, so a more bulky group is
preferred in that region. On the basis of the spatial arrangement of the various
shape, electrostatic and steric potential contributions, the model proposed in this
work is useful in describing the QSAR of quinazoline derivatives as EGFR tyrosine
kinase inhibitors and can be employed to design new quinazoline derivatives with
specific inhibitory activity.